The Center for Information Systems Research (CISR) is a research center
located in, and managed by, M.I.T.'s Sloan School of Management; it consists
of a group of Management Information Systems specialists, including faculty
members, full-time research staff and part-time students. The Center's general
research thrust is to devise better means for designing, generating and
maintaining applications software, information systems and decision support
systems.


Within the context of the research effort sponsored by the Naval Electronics
Systems Command under contract N00039-77-C-0255, CISR proposed to conduct basic
research on a systematic approach to the early phases of complex software
systems design. One of the main goals is the development of a well-defined
methodology aimed at explicitly filling the gap between system requirements
and program specifications that characterizes most traditional system design
strategies. At the heart of such a methodology is the structuring of the
initial set of requirements so as to make apparent the design trade-offs that
exist among its elements; the main focus of the proposed methodology is the
decomposition of that set into subsets of strongly interdependent requirements,
which define a meaningful framework for system design. The research project is
organized so as to investigate the following four areas:

1) Graph-like representation of requirements sets and suitable decomposition
techniques,

2) Design and development of a set of software tools to support the set
decomposition activity,

3) Identification of a methodology for the assessment of interdependencies
among requirements, as well as guidelines for the interpretation of the
obtained decompositions and for the coordination of design subproblems,
and

4) Experimental application of the methodology and supporting tools to a
specific case, with emphasis on recommendations for their practical use
and comparison with more traditional design approaches.

This document focuses on the activities carried out at CISR to investigate
the second area outlined above.


CONTRACT N00039-77-C-0255


Technical Report #2
EXECUTIVE SUMMARY

This Report describes an interactive software package that incorporates
the decomposition techniques discussed in Technical Report #1, along with a
number of facilities for checking the initial set for consistency,
convenient user interfacing, evaluating partitions, saving and restoring problems,
and performing traditional cluster analysis with three different algorithms.

The different capabilities of the package are illustrated by means of
simple examples, and its usage in the context of a non-trivial decomposition
problem is discussed in some detail.

The package was written in FORTRAN and is currently operational on the
Sloan School's PRIME minicomputer.
SOLVING DECOMPOSITION PROBLEMS: ALTERNATIVE

TECHNIQUES AND DESCRIPTION OF SUPPORTING TOOLS


1. Introduction

This report has two main purposes:

a) To describe a software package that incorporates most of the algorithms
presented in [Andreu 77] and allows its users to work interactively;
and

b) To illustrate the usage of this package for solving a non-trivial
graph decomposition problem.
2. The Package

In this section we describe the facilities available in a "graph
decomposition" software package that incorporates most of the algorithms
described in [Andreu 77] and permits their interactive usage. It has
been implemented on the PRIME minicomputer at the Sloan School. Here
we emphasize the basic available facilities; their use is illustrated
by means of simple examples in Section 3. A non-trivial graph decomposition
problem is then solved using the package in Section 4, which thus
illustrates its usefulness in more realistic settings.

2.1. Basic facilities

The package was designed to incorporate the algorithms described
in [Andreu 77]. Basically, it allows its users to perform the following
operations:

(a) Enter the structure of a graph in a convenient manner; the
associated adjacency matrix is generated by the package.

(b) Compute a distance matrix for the graph under analysis, using
the strategy described in [Andreu 77, Section 5.2.2]. A
value of p = 1 is assumed, and preclustering (collapsing
nodes) is automatically performed as needed. If preclustering
occurs, the user has two options for computing a distance
matrix: either treating the groups of collapsed nodes as
single nodes, or not. The resulting distance matrix can be
used in traditional cluster analytic algorithms to partition
A brief summary of these commands is presented in Section 2.3
below.

2.2. Structure of the package

The package is characterized by a basic design feature which
must be understood in order to use it effectively. It
was designed to work with one decomposition problem at a time; this
implies a concept of "currency" regarding the different entities that
may exist at any point in time, and to which the user can refer.

The most fundamental entity with which the package deals is a
graph. At any point in time, there is a "current graph" on whose
decomposition the package is working. A number of commands are
available to change the current graph as needed. This currency
concept is extended to other entities that may be generated by the package
at the user's request. Most notably, a current adjacency matrix, distance
matrix, similarity matrix and partition may exist. Any user request
which requires one of these entities is processed using its current
version if it exists; an error message is issued otherwise. For
example, a request to print the similarity matrix results in its
current version being printed. The current version of any entity is
the one last computed or specified.

Furthermore, since the package allows the usage of the different
available analytic tools in no predefined order, a notion of "current
status" is also needed. The reason is that some requests cannot be
processed unless others have been processed previously for the current
graph. For example, a cluster analytic algorithm cannot be used unless
a distance matrix is available. In other words, the order in which
the user can issue requests is not completely arbitrary; rather, the
requests that can be processed at a given point in time are a function
of what has been done so far. The notion of "current status" is
useful to summarize what is currently available to the package and,
consequently, which commands can be meaningfully processed. Changes
in the current status may come about after the processing of certain
user requests that change the entities with which the package deals
(e.g., distance matrix, etc.). The "current status" can be saved
and later re-activated; this allows the user to explore different
solutions to a given decomposition problem while minimizing processing.

For the purpose of the discussion here, it is sufficient to
think of the current status as a variable that can take five values
(status levels), with the following associated meanings:

• Level 0: Nothing is available; the package has just been
  invoked and no graph has been specified yet;

• Level 1: A graph has been specified and the associated
  adjacency matrix is available;

• Level 2: In addition to what is available at level 1, a
  graph partition has been computed or specified;

• Level 3: A distance matrix is also available;

• Level 4: A similarity matrix is available.


In the description of the 28 available requests contained in
the next section, we indicate, for each of them, both the minimum
status level required for the request to be processable and the
status level resulting after its execution. Whenever a request is
issued which cannot be processed at the time, the message "UNABLE TO
DO IT" is printed on the user's terminal.
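To make the gating rule concrete, here is a minimal Python sketch (not the package's FORTRAN code) that treats the current status as an integer and each request as a pair (MSL, RSL). The pairs are copied from the command descriptions in Section 2.3; the names `COMMANDS` and `process` are hypothetical, `None` stands for "Unchanged", and REST is left out because its resulting level depends on the file being restored.

```python
from typing import Dict, Optional, Tuple

# (minimum status level, resulting status level); None means "Unchanged".
# Values copied from the command descriptions in Section 2.3. REST is
# omitted because its resulting level (1, 2 or 3) depends on the file read.
COMMANDS: Dict[str, Tuple[int, Optional[int]]] = {
    "ENGR": (0, 1), "PRNO": (1, None), "PRAD": (1, None), "ISOL": (1, None),
    "NOLK": (1, None), "DELK": (1, 1), "DENO": (1, 1), "ADLK": (1, 1),
    "CLON": (1, 2), "NMPA": (1, 2), "DIMS": (1, 3), "DIMN": (1, 3),
    "SIDI": (4, 3), "ITER": (3, 3), "SIMA": (3, 4), "INPA": (4, 4),
    "HCM1": (3, None), "HCM2": (3, None), "HCM3": (3, None),
    "EVAL": (2, None), "PRLK": (2, None), "PRCL": (2, None),
    "PRDI": (3, None), "PRSI": (4, None), "SAVE": (1, None),
    "DEFI": (0, None), "QUIT": (0, None),
}


def process(command: str, status: int) -> Tuple[int, str]:
    """Return (new status level, message) for one request."""
    msl, rsl = COMMANDS[command]
    if status < msl:
        return status, "UNABLE TO DO IT"     # request refused, status kept
    return (status if rsl is None else rsl), "OK"


# Replaying part of the sample session of Section 3.
status = 0
for cmd in ["ENGR", "CLON", "DIMN", "SIMA", "INPA", "ADLK", "ITER"]:
    status, msg = process(cmd, status)
    print(f"{cmd}: level {status} {msg}")
```

With this table, the replayed requests behave as in the sample session: once "ADLK" brings the status back to level 1, the following "ITER" is refused.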


2.3. Request command summary

The package is invoked by entering the command "RUN *GRDEC" (1).
Upon invocation, the user is asked if he needs a description of the
available command codes (codes are more or less mnemonic four-letter
words). If the answer is affirmative, these are displayed and a
prompt ("REQ:") is issued for the first request, as shown below.

RUN *GRDEC
GO

DO YOU NEED THE REQUEST CODES (YES OR NO).

YES

ENGR: ENTER GRAPH DATA; BECOMES CURRENT GRAPH.
REST: READ IN A STATUS PREVIOUSLY SAVED.
PRNO: PRINT CURRENT NUMBER OF NODES.
PRAD: PRINT ADJACENCY MATRIX.
ISOL: CHECK FOR ISOLATED NODES.
NOLK: PRINT LINKS ASSOCIATED WITH EACH NODE.
DELK: DELETE GRAPH LINKS.
DENO: DELETE GRAPH NODES (NODES RENUMBERED).
ADLK: ADD GRAPH LINKS.
CLON: COMPUTE PRECLUSTERING, P = 1 ASSUMED.
NMPA: SPECIFY A PARTITION.
DIMS: COMPUTE DISTANCE MATRIX AND PERFORM PRECLUSTERING,
      CLUSTERS CONSIDERED AS SINGLE NODES, P = 1 ASSUMED.
DIMN: AS DIMS, BUT CLUSTERS NOT CONSIDERED SINGLE NODES.
SIDI: COMPUTE DISTANCE MATRIX FROM CURRENT SIMILARITY MATRIX.
ITER: ITERATE ON DISTANCE MATRIX.
SIMA: COMPUTE SIMILARITY MATRIX FROM CURRENT DISTANCE MATRIX.
INPA: COMPUTE AN INITIAL PARTITION FROM CURRENT SIMILARITY MATRIX,
      PERCENTAGE VALUE WILL BE REQUESTED.
HCM1: HIERARCHICAL CLUSTERING (METHOD 1).
HCM2: HIERARCHICAL CLUSTERING (METHOD 2).
HCM3: HIERARCHICAL CLUSTERING (METHOD 3).
EVAL: EVALUATE CURRENT CLUSTERING.
PRLK: PRINT LINKS AMONG CLUSTERS.
PRCL: DISPLAY CURRENT CLUSTERING.
PRDI: PRINT CURRENT DISTANCE MATRIX.
PRSI: PRINT CURRENT SIMILARITY MATRIX.
SAVE: SAVE CURRENT STATUS.
DEFI: DELETE A FILE PREVIOUSLY SAVED.
QUIT: QUIT.

REQ:


(1) On the Sloan School's PRIME minicomputer.
At this point, the current status level is 0.

Each command is described at some length below. Examples of
their usage can be found in Section 3.

1) ENGR.

Minimum status level required (MSL): 0.

Resulting status level (RSL): 1.

Upon typing "ENGR" in response to the "REQ:" prompt, the system prompts
the user to enter graph data. Graph data must be entered according to
the following format:

*N/ni;nj1,nj2,.../.../$ (embedded blanks allowed),

where "N" is the number of nodes in the graph, and a group of the form
"/ni;nj1,nj2,.../" indicates that there are links between node number
"ni" and nodes "nj1", "nj2", etc. "$" is used to indicate the end of
data. Data can be entered in more than one line as long as the last
character in each line is "/" (except for the last line, which must
end with "$"; the system will keep prompting the user after a line
ending with "/" is entered). Input data is checked for correct format
and to ensure that no node number is greater than the specified number
of nodes (the system assumes that nodes are assigned the consecutive
numbers 1, ..., N); if an error is detected, the user is notified and
the input process must start again from the first line.

Note that since the package deals with undirected graphs whose
adjacency matrices are symmetric, links need to be specified only in
one direction. An orderly and often convenient way of entering data
is to specify "/ni;nj1,nj2,.../" groups in increasing order of "ni"
values; only "nj" values such that nj > ni then need to be included
in each group (main diagonal entries of the adjacency matrix are
automatically set to 1).
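The input format above can be parsed in a few lines. The following Python sketch is an illustration only: `parse_graph` is a hypothetical name, and it assumes that all input lines have already been joined into one string and that the node numbers inside a group are separated by commas, as in the format shown.

```python
from typing import List


def parse_graph(text: str) -> List[List[int]]:
    """Turn ENGR-style input '*N/ni;nj1,nj2,.../.../$' into a symmetric
    0/1 adjacency matrix; nodes are numbered 1..N and the diagonal is 1."""
    data = "".join(text.split())            # drop embedded blanks/newlines
    if not (data.startswith("*") and data.endswith("/$")):
        raise ValueError("data must look like '*N/.../$'")
    head, *groups = data[1:-1].split("/")
    n = int(head)
    adj = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for group in groups:
        if not group:                       # empty piece after the last '/'
            continue
        node, sep, others = group.partition(";")
        if not sep:
            raise ValueError(f"missing ';' in group '/{group}/'")
        i = int(node)
        for token in others.split(","):
            j = int(token)
            if not (1 <= i <= n and 1 <= j <= n):
                raise ValueError(f"node number out of range in '/{group}/'")
            adj[i - 1][j - 1] = adj[j - 1][i - 1] = 1   # undirected link
    return adj


adj = parse_graph("*6/1;2,3,4/2;5,6/3;4/5;6/$")   # the graph of Figure 1
for row in adj:
    print(*row)
```

Because each link is stored in both directions, giving only the "nj > ni" half, as suggested above, is enough to obtain the full symmetric matrix.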

2) REST.

MSL: 0.

RSL: 1, 2 or 3. (See "SAVE" below.)

The user will be asked to specify a file name from which to read
graph data; such a file must have been created previously by a
"SAVE" command.

3) PRNO.

MSL: 1.

RSL: Unchanged.

The dimension (number of nodes) of the current graph is printed.

4) PRAD.

MSL: 1.

RSL: Unchanged.

5) ISOL.

MSL: 1.

RSL: Unchanged.

Any isolated node in the current graph (a node with no links associated
with it) is printed.

6) NOLK.

MSL: 1.

RSL: Unchanged.

A listing is produced in which each node in the current graph is shown
with the nodes it is linked to. A measure of the "link density" of
the current graph, the "average number of links per node", is also
printed.
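Since every link is counted once at each of its two end nodes, the average printed by "NOLK" is twice the number of links divided by the number of nodes (14/6, or 2.333, for the graph of Figure 1 used in Section 3). A minimal sketch, with hypothetical names, that builds the same listing from an adjacency matrix and also finds isolated nodes as "ISOL" does:

```python
from typing import Dict, List, Tuple


def node_links(adj: List[List[int]]) -> Tuple[Dict[int, List[int]], float]:
    """Return {node: linked nodes} (1-based) and the average links per node."""
    n = len(adj)
    links = {i + 1: [j + 1 for j in range(n) if j != i and adj[i][j]]
             for i in range(n)}
    average = sum(len(v) for v in links.values()) / n
    return links, average


# Adjacency matrix of Figure 1, as printed by "PRAD" in Section 3.
FIG1 = [[1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 1], [1, 0, 1, 1, 0, 0],
        [1, 0, 1, 1, 0, 0], [0, 1, 0, 0, 1, 1], [0, 1, 0, 0, 1, 1]]

links, average = node_links(FIG1)
for node, others in links.items():
    print(node, f"({len(others)})", *others)
print(f"(AVERAGE NO. OF LINKS PER NODE: {average:.3f})")

isolated = [node for node, others in links.items() if not others]  # as ISOL
```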

7) DELK.

MSL: 1.

RSL: 1.

The user is prompted (":") to enter the links to be deleted. Input
data format is the same as for "ENGR", with the exception that no "*N"
part is required. Note that the resulting status level is 1, since
the adjacency matrix is modified.


8) DENO.

MSL: 1.

RSL: 1.

The user is prompted to enter the node numbers to be deleted. Input
data format should be of the form

/n1,n2,...,nk/,

where n1, ..., nk are the node numbers to be deleted. Embedded blanks
are allowed, and data can be entered in more than one line as long as
each line ends with "," (the last one should end with "/"). If the
nodes being deleted are other than the last k nodes, the remaining
ones are renumbered; the newly assigned numbers are printed out.
Attempts to remove all the nodes, or nodes whose number is greater
than the current number of nodes, will fail.
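The renumbering rule can be sketched as follows, assuming that the surviving nodes keep their relative order and are numbered 1, 2, ... again. This is consistent with the description above but is not the package's code, and `delete_nodes` is a hypothetical name. The example removes nodes 1 and 2 from the graph of Figure 1 (the same node numbers deleted at the end of Section 3, although there they are removed from a modified graph).

```python
from typing import Dict, Iterable, List, Tuple

Matrix = List[List[int]]


def delete_nodes(adj: Matrix, doomed: Iterable[int]) -> Tuple[Matrix, Dict[int, int]]:
    """Remove the 1-based nodes in `doomed`; return the reduced adjacency
    matrix and the mapping old number -> new number of surviving nodes."""
    n = len(adj)
    to_delete = set(doomed)
    if any(not 1 <= k <= n for k in to_delete):
        raise ValueError("node number out of range")
    if len(to_delete) == n:
        raise ValueError("cannot delete all the nodes")
    keep = [k for k in range(1, n + 1) if k not in to_delete]
    renumber = {old: new for new, old in enumerate(keep, start=1)}
    reduced = [[adj[i - 1][j - 1] for j in keep] for i in keep]
    return reduced, renumber


FIG1 = [[1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 1], [1, 0, 1, 1, 0, 0],
        [1, 0, 1, 1, 0, 0], [0, 1, 0, 0, 1, 1], [0, 1, 0, 0, 1, 1]]

reduced, renumber = delete_nodes(FIG1, [1, 2])
print(renumber)   # {3: 1, 4: 2, 5: 3, 6: 4}
```

Deleting only the last nodes leaves every surviving number unchanged, which is why the package needs to report a renumbering only in the other case.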
9) ADLK.

MSL: 1.

RSL: 1.

System prompt and input data format as in "DELK".

10) CLON.

MSL: 1.

RSL: 2.

A "partition" is identified in which the members of each subgraph are
collapsed nodes. Usually this partition won't be of practical interest
as an end result, but it gives insight into the structure of the graph
under analysis.
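The rule that decides which nodes collapse is deferred to [Andreu 77] (with p = 1). In every sample output of Section 3, the collapsed nodes are exactly those whose closed neighbourhoods (the node itself plus the nodes directly linked to it) coincide: {3,4} and {5,6} for Figure 1, and {1,3} and {5,6} once the link 2-3 is added. The Python sketch below assumes that rule; `preclusters` is a hypothetical name.

```python
from typing import Dict, FrozenSet, List


def preclusters(adj: List[List[int]]) -> List[List[int]]:
    """Group 1-based nodes whose closed neighbourhoods are identical
    (assumed collapsing rule); groups are listed by their first member."""
    groups: Dict[FrozenSet[int], List[int]] = {}
    for i, row in enumerate(adj):
        closed = frozenset(j for j, a in enumerate(row) if a or j == i)
        groups.setdefault(closed, []).append(i + 1)
    return list(groups.values())


FIG1 = [[1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 1], [1, 0, 1, 1, 0, 0],
        [1, 0, 1, 1, 0, 0], [0, 1, 0, 0, 1, 1], [0, 1, 0, 0, 1, 1]]
print(preclusters(FIG1))        # [[1], [2], [3, 4], [5, 6]]

fig1_plus = [row[:] for row in FIG1]
fig1_plus[1][2] = fig1_plus[2][1] = 1     # the link 2-3 added with "ADLK"
print(preclusters(fig1_plus))   # [[1, 3], [2], [4], [5, 6]]
```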

11) NMPA.

MSL: 1.

RSL: 2.

The user is prompted to enter a partition. Input data format should be
of the form

/n1,n2,.../.../,

where the node numbers between a pair of consecutive "/" indicate
a subgraph. Embedded blanks are allowed, as well as multi-line input,
as long as each line ends with "/" or ",". Every node must appear
in the input data; otherwise it won't be accepted. Note that the
resulting status level is 2: since the partition is externally
specified, it might otherwise create inconsistencies with the
preclustering derived from collapsing nodes. Note also that this means
that any available distance or similarity matrix will be lost after
executing this command; these should be saved for future reference
(see "SAVE").
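A hypothetical parser for this input is sketched below. It assumes that commas separate the node numbers inside a subgraph, and it requires each node to appear exactly once (the report only states that every node must appear).

```python
from typing import List


def parse_partition(text: str, n: int) -> List[List[int]]:
    """Turn '/1,3,4/2,5,6/' into [[1, 3, 4], [2, 5, 6]] for an n-node graph."""
    data = "".join(text.split())                 # blanks and line breaks
    if len(data) < 2 or data[0] != "/" or data[-1] != "/":
        raise ValueError("partition must start and end with '/'")
    parts = [[int(tok) for tok in grp.split(",")]
             for grp in data[1:-1].split("/")]
    if sorted(k for grp in parts for k in grp) != list(range(1, n + 1)):
        raise ValueError("every node 1..n must appear exactly once")
    return parts


print(parse_partition("/1, 3, 4/\n2,5,6/", 6))   # [[1, 3, 4], [2, 5, 6]]
```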

12) DIMS.

MSL: 1.

RSL: 3.

13) DIMN.

As DIMS.
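The distance itself is defined in [Andreu 77, Section 5.2.2] and is not reproduced in this report. As an assumption only, one minus the Jaccard ratio of closed neighbourhoods, |N[x] ∩ N[y]| / |N[x] ∪ N[y]|, reproduces all three distance matrices printed in Section 3 (DIMN and DIMS on Figure 1, and DIMN after the link 2-3 is added). The sketch below uses that formula. With `collapse=True`, each group of collapsed nodes becomes one node before neighbourhoods are taken (as DIMS does); with `collapse=False`, neighbourhoods come from the original graph (as DIMN does). All function names are hypothetical, and only the direct-neighbour case (p = 1) is covered.

```python
from typing import List, Set

Matrix = List[List[int]]


def preclusters(adj: Matrix) -> List[List[int]]:
    """Assumed collapsing rule: nodes with identical closed neighbourhoods
    form one group. Returns 0-based node indices."""
    groups = {}
    for i, row in enumerate(adj):
        key = frozenset(j for j, a in enumerate(row) if a or j == i)
        groups.setdefault(key, []).append(i)
    return list(groups.values())


def closed_neighbourhood(adj: Matrix, i: int) -> Set[int]:
    return {j for j, a in enumerate(adj[i]) if a} | {i}


def jaccard_distance(x: Set[int], y: Set[int]) -> float:
    return 1 - len(x & y) / len(x | y)


def distance_matrix(adj: Matrix, collapse: bool) -> List[List[float]]:
    groups = preclusters(adj)
    if collapse:
        # Group g sees group h if any member of g is linked to any member
        # of h; the diagonal of adj is 1, so every group sees itself.
        nbhd = [{h for h, other in enumerate(groups)
                 if any(adj[a][b] for a in g for b in other)}
                for g in groups]
    else:
        # Members of a group share one closed neighbourhood; use the first.
        nbhd = [closed_neighbourhood(adj, g[0]) for g in groups]
    return [[round(jaccard_distance(x, y), 2) for y in nbhd] for x in nbhd]


FIG1 = [[1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 1], [1, 0, 1, 1, 0, 0],
        [1, 0, 1, 1, 0, 0], [0, 1, 0, 0, 1, 1], [0, 1, 0, 0, 1, 1]]
for row in distance_matrix(FIG1, collapse=False):   # compare with DIMN
    print(" ".join(f"{d:.2f}" for d in row))
```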

14) SIDI.

MSL: 4.

RSL: 3.

15) ITER.

MSL: 3.

RSL: 3.

16) SIMA.

MSL: 3.

RSL: 4.
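In the sample session of Section 3, every entry of the similarity matrix printed after "SIMA" equals one minus the corresponding distance. The sketch below assumes exactly that. The report does not state the formula, and a rescaling such as (dmax - d)/(dmax - dmin) would give the same numbers on that example, since its distances run from 0 to 1. The function name is hypothetical.

```python
from typing import List


def similarity_from_distance(dist: List[List[float]]) -> List[List[float]]:
    """Assumed SIMA rule: s = 1 - d, rounded to two decimals for printing."""
    return [[round(1 - d, 2) for d in row] for row in dist]


# Distance matrix printed after "DIMN" in Section 3.
DIST = [[0.00, 0.67, 0.25, 0.83], [0.67, 0.00, 0.83, 0.25],
        [0.25, 0.83, 0.00, 1.00], [0.83, 0.25, 1.00, 0.00]]
for row in similarity_from_distance(DIST):
    print(" ".join(f"{s:.2f}" for s in row))
```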

17) INPA.

MSL: 4.

RSL: 4.

18) HCM1, HCM2, HCM3.

MSL: 3.

RSL: Unchanged.

These commands all use the current distance matrix to perform
agglomerative cluster analysis. They generate a "clustering trace"
in the form of a hierarchical tree which shows how nodes are successively
clustered together; the tree may optionally be printed. The successive
partitions produced by this trace are evaluated and the best one is
retained as the "current partition".

The commands differ in the cluster analytic method employed, that is,
in the criterion used, at a given point in the node-merging process,
to decide which clusters to merge next, as follows:

• Method 1 (HCM1) merges the "closest" pair of clusters,
measuring the distance between two clusters A and B by the
mean of the distances between nodes of A and nodes of B; that
is,

$$d(A,B) = \frac{1}{n_A n_B} \sum_{a \in A} \sum_{b \in B} s(a,b),$$

where n_A and n_B are the number of elements in A and B
respectively, and the summation is over all the elements a ∈ A
and b ∈ B.

• Method 2 (HCM2) merges the pair of clusters that leads to the
minimum mean of the distances between all pairs of nodes in
the cluster resulting from the merger. That is, at each
step it chooses to merge the two clusters that, once merged,
produce the minimum x:

$$x = \frac{1}{n_A^2} \sum_{a \in A} \sum_{a' \in A} s(a,a'),$$

where n_A is the number of nodes in the merger A, a, a' ∈ A, and
the summation is over all pairs of nodes (main diagonal entries
of the distance matrix S are included in it).

• Method 3 (HCM3), finally, merges the two clusters A and
B that lead to the minimum y:

$$y = \frac{1}{n_A n_B} \sum_{a \in A} \sum_{b \in B} s(a,b) - \frac{1}{n_A^2} \sum_{a \in A} \sum_{a' \in A} s(a,a') - \frac{1}{n_B^2} \sum_{b \in B} \sum_{b' \in B} s(b,b'),$$

where the first summation is over all pairs of nodes in
A and B, the second over all pairs in A, and the third over
all pairs in B.
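The three merging criteria can be written as plain functions and plugged into one agglomerative loop. This Python sketch follows the formulas above, including the n_A^2 normalization of the within-cluster means. It only returns the sequence of partitions, whereas the package also evaluates each one and keeps the best. Ties are broken by taking the first pair found, so on the matrix used below the first merge joins {1} with {3,4}. The HCM3 tree printed in Section 3 merges 2 with {5,6} first, but both orders lead to the same two-subgraph partition. Function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, List

Cluster = List[int]
Matrix = List[List[float]]


def hcm1(A: Cluster, B: Cluster, s: Matrix) -> float:
    """Mean distance between the members of A and the members of B."""
    return sum(s[a][b] for a in A for b in B) / (len(A) * len(B))


def within(A: Cluster, s: Matrix) -> float:
    """Mean over all ordered pairs of A, diagonal included (n_A^2 terms)."""
    return sum(s[a][b] for a in A for b in A) / len(A) ** 2


def hcm2(A: Cluster, B: Cluster, s: Matrix) -> float:
    """Mean distance inside the cluster that the merger would produce."""
    return within(A + B, s)


def hcm3(A: Cluster, B: Cluster, s: Matrix) -> float:
    """Between-cluster mean minus the two within-cluster means."""
    return hcm1(A, B, s) - within(A, s) - within(B, s)


def agglomerate(s: Matrix,
                criterion: Callable[[Cluster, Cluster, Matrix], float]
                ) -> List[List[Cluster]]:
    """Repeatedly merge the pair of clusters with the smallest criterion
    value; return the partition obtained after each merge."""
    clusters: List[Cluster] = [[i] for i in range(len(s))]
    trace: List[List[Cluster]] = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: criterion(clusters[p[0]], clusters[p[1]], s))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        trace.append([list(c) for c in clusters])
    return trace


# Distance matrix computed by "DIMN" (and later restored) in Section 3;
# rows and columns stand for {1}, {2}, {3,4} and {5,6}.
S = [[0.00, 0.67, 0.25, 0.83],
     [0.67, 0.00, 0.83, 0.25],
     [0.25, 0.83, 0.00, 1.00],
     [0.83, 0.25, 1.00, 0.00]]

for method in (hcm1, hcm2, hcm3):
    print(method.__name__, agglomerate(S, method))
```

All three criteria reach the partition {0,2}, {1,3}, that is, {1,3,4} and {2,5,6}, after two merges, as the sample session reports for HCM1, HCM2 and HCM3.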


19) EVAL.

MSL: 2.

RSL: Unchanged.

The current partition is evaluated and its measure printed out, along
with its two components, "strength" and "coupling".
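Strength and coupling are not defined in this report. The formulas in the following sketch are a hypothetical reconstruction: they reproduce every value printed in Section 3 (strength 0 and coupling 3 for {1}, {2}, {3,4}, {5,6}; strength 0.6667 for {1,3,4}, {2,5,6}; and the measures -1.000, 0.556 and 0.133 listed under the HCM3 tree), but they are inferred from those numbers, not taken from [Andreu 77]. Strength sums (L_k - (n_k - 1)) / (n_k(n_k - 1)/2) over subgraphs with n_k ≥ 2 nodes and L_k internal links; coupling sums L_kl / (n_k n_l) over pairs of subgraphs joined by L_kl links; and the measure is strength minus coupling. The name `evaluate` is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def evaluate(adj: List[List[int]], partition: List[List[int]]
             ) -> Tuple[float, float, float]:
    """Hypothetical EVAL: (strength, coupling, measure) for a partition
    given as lists of 1-based node numbers."""
    def links(A: List[int], B: List[int]) -> int:
        return sum(adj[a - 1][b - 1] for a in A for b in B)

    strength = 0.0
    for c in partition:
        n = len(c)
        if n > 1:                       # singletons contribute nothing
            internal = sum(adj[a - 1][b - 1] for a, b in combinations(c, 2))
            strength += (internal - (n - 1)) / (n * (n - 1) / 2)
    coupling = sum(links(A, B) / (len(A) * len(B))
                   for A, B in combinations(partition, 2))
    return strength, coupling, strength - coupling


FIG1 = [[1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 1], [1, 0, 1, 1, 0, 0],
        [1, 0, 1, 1, 0, 0], [0, 1, 0, 0, 1, 1], [0, 1, 0, 0, 1, 1]]
for p in ([[1], [2], [3, 4], [5, 6]], [[2, 5, 6], [1], [3, 4]],
          [[1, 3, 4], [2, 5, 6]], [[1, 2, 3, 4, 5, 6]]):
    s, c, m = evaluate(FIG1, p)
    print(p, f"STRENGTH: {s:.4f}, COUPLING: {c:.4f}, MEASURE: {m:.3f}")
```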

20) PRLK.

MSL: 2.

RSL: Unchanged.

The links joining nodes belonging to different subgraphs in the current
partition are printed out. This helps in figuring out how the
components of the current partition interact.
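Listing the inter-subgraph links only requires a map from each node to its subgraph. A minimal sketch, with a hypothetical function name, applied to the graph of Figure 1:

```python
from typing import List, Tuple


def links_between_clusters(adj: List[List[int]], partition: List[List[int]]
                           ) -> List[Tuple[int, int]]:
    """Return links (a, b), a < b, whose end nodes lie in different subgraphs."""
    owner = {node: k for k, cluster in enumerate(partition) for node in cluster}
    n = len(adj)
    return [(a, b) for a in range(1, n + 1) for b in range(a + 1, n + 1)
            if adj[a - 1][b - 1] and owner[a] != owner[b]]


FIG1 = [[1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 1], [1, 0, 1, 1, 0, 0],
        [1, 0, 1, 1, 0, 0], [0, 1, 0, 0, 1, 1], [0, 1, 0, 0, 1, 1]]
print(links_between_clusters(FIG1, [[1, 3, 4], [2, 5, 6]]))   # [(1, 2)]
```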

21) PRCL.

MSL: 2.

RSL: Unchanged.

22) PRDI.

MSL: 3.

RSL: Unchanged.
23) PRSI.

MSL: 4.

RSL: Unchanged.

24) SAVE.

MSL: 1.

RSL: Unchanged.


The user is prompted to specify a file name where the current
status will be saved. The file name may be up to 6 characters long,
beginning with a letter. Depending upon the current status level, the
following is saved away in the specified file:

• Status level 1: Current adjacency matrix.

• Status level 2: Current adjacency matrix plus current
  partition.

• Status level 3 or 4: Current adjacency matrix, plus current
  distance matrix, plus any preclustering that
  might have occurred while computing the
  distance matrix. The similarity matrix is
  never saved because its recomputation
  ("SIMA") is very fast.


25) DEFI.

MSL: 0.

RSL: Unchanged.

The specified file is deleted.
26) QUIT.

MSL: 0.

Work session is ended. Current status is lost if not previously saved.
3. A Sample Session

This section illustrates the usage of the package with a simple
graph. We work with the graph depicted in Figure 1.


Figure 1


The first thing we do is to enter the graph structure through an "ENGR"
command:

ENGR

*6/1;2,3,4/2;5,6/3;4/5;6/$

NEW GRAPH ENTERED.

At this point, a current adjacency matrix is available. It is often
advisable to display it ("PRAD") and to request information about the current
graph so as to check that the entered graph is indeed the correct one:

REQ:

PRAD

CURRENT ADJACENCY MATRIX:
     1  2  3  4  5  6
 1:  1  1  1  1  0  0
 2:  1  1  0  0  1  1
 3:  1  0  1  1  0  0
 4:  1  0  1  1  0  0
 5:  0  1  0  0  1  1
 6:  0  1  0  0  1  1

REQ:

PRNO

CURRENT NUMBER OF NODES: 6.


REQ:

NOLK

RECORDED LINKS.

FROM NODE  TO NODE(S):
 1 (3)  2, 3, 4
 2 (3)  1, 5, 6
 3 (2)  1, 4
 4 (2)  1, 3
 5 (2)  2, 6
 6 (2)  2, 5

(AVERAGE NO. OF LINKS PER NODE: 2.333).

REQ:
ISOL

NO ISOLATED NODES.

Using  "CLON",  we  next  specify  that  we  wish  to  perforin  preliminary 
clustering  (a  value  of  p = 1 is  assumed  by  the  system) : 

REQ:

CLON

PRECLUSTERING PERFORMED WITH P = 1.

To display the current clustering (partition), we use "PRCL":

REQ:

PRCL

CLUSTER (NO) OBJECTS
 1 (1) 1
 2 (1) 2
 3 (2) 3 4
 4 (2) 5 6


We can also evaluate it ("EVAL"):

REQ:

EVAL

STRENGTH: 0.0000,
COUPLING: 3.0000,
MEASURE: -3.000.

REQ:
Next, we can compute a distance matrix without treating collapsed nodes
as single nodes ("DIMN"), display it ("PRDI"), compute a similarity
matrix from it ("SIMA"), display this matrix ("PRSI"), and compute an
initial partition with a percentage parameter of 80% ("INPA"), which
can then be evaluated ("EVAL"):


DIMN

(PRECLUSTERING COMPLETE)

PRECLUSTERING PERFORMED AND DISTANCE MATRIX COMPUTED WITH P = 1.
CLUSTERS NOT TAKEN AS SINGLE NODES.

REQ:
PRDI

CURRENT DISTANCE MATRIX:
      1     2     3     4
 1: 0.00  0.67  0.25  0.83
 2: 0.67  0.00  0.83  0.25
 3: 0.25  0.83  0.00  1.00
 4: 0.83  0.25  1.00  0.00

REQ:

SIMA

SIMILARITY MATRIX COMPUTED.

REQ:

PRSI

CURRENT SIMILARITY MATRIX:
      1     2     3     4
 1: 1.00  0.33  0.75  0.17
 2: 0.33  1.00  0.17  0.75
 3: 0.75  0.17  1.00  0.00
 4: 0.17  0.75  0.00  1.00

REQ:

INPA

ENTER PERCENTAGE PARAMETER:

80.

INITIAL PARTITION COMPUTED WITH P = 80.00 %.
REQ:

PRCL

CLUSTER (NO) OBJECTS
 1 (3) 1 3 4
 2 (3) 2 5 6

REQ:

EVAL

STRENGTH: 0.6667,
COUPLING: 0.1111,
MEASURE: 0.556.

REQ:


To save the current status, we use "SAVE" (note that a check
is performed so as not to destroy an existing file if the user does not
want to):


SAVE

ENTER FILE NAME:
SIXNOD

FILE SIXNOD ALREADY EXISTS.
DO YOU WANT TO DESTROY IT?

NO

ENTER FILE NAME:

NO6ALL

STATUS SAVED IN FILE NO6ALL


A distance matrix can also be computed by taking collapsed
nodes as single nodes ("DIMS"):
REQ:

DIMS

(PRECLUSTERING COMPLETE)

PRECLUSTERING PERFORMED AND DISTANCE MATRIX COMPUTED WITH P = 1.
CLUSTERS TAKEN AS SINGLE NODES.

REQ:
PRDI

CURRENT DISTANCE MATRIX:
      1     2     3     4
 1: 0.00  0.50  0.33  0.75
 2: 0.50  0.00  0.75  0.33
 3: 0.33  0.75  0.00  1.00
 4: 0.75  0.33  1.00  0.00

We now use the distance matrix saved previously to perform
cluster analysis. First, we restore the status saved above ("REST"):


REQ:

REST

ENTER FILE NAME:
NO6ALL

ADJACENCY MATRIX, PRECLUSTERING AND DISTANCE MATRIX READ FROM FILE NO6ALL

REQ:

PRDI

CURRENT DISTANCE MATRIX:
      1     2     3     4
 1: 0.00  0.67  0.25  0.83
 2: 0.67  0.00  0.83  0.25
 3: 0.25  0.83  0.00  1.00
 4: 0.83  0.25  1.00  0.00
With the distance matrix shown above, we perform hierarchical
cluster analysis by means of the commands "HCM1", "HCM2" and "HCM3".
Note that the system informs the user of the measure associated with
the best identified partition, and gives him a chance to print the
hierarchical clustering trace (tree); the best identified partition
becomes the "current partition", which can therefore be printed through
"PRCL":


REQ:

HCM1

BEST PARTITION MEASURE: 0.556
DO YOU WANT TO PRINT THE TREE?

NO

REQ:

PRCL

CLUSTER (NO) OBJECTS
 1 (3) 1 3 4
 2 (3) 2 5 6

REQ:

HCM2

BEST PARTITION MEASURE: 0.556
DO YOU WANT TO PRINT THE TREE?

NO

REQ:

PRCL

CLUSTER (NO) OBJECTS
 1 (3) 1 3 4
 2 (3) 2 5 6

REQ:

HCM3

BEST PARTITION MEASURE: 0.556
DO YOU WANT TO PRINT THE TREE?

YES

SET PAPER AND PRESS RETURN:
If the user wants to print the tree, it comes out as shown below:


[Figure: HCM3 clustering tree, levels 1 to 25 across the top]

COLLAPSED OBJECTS:
<*1*>: 5 6

MEASURES:
-1.000   0.556   0.133


Note that objects (nodes) collapsed while computing the distance
matrix are treated as single nodes during the clustering process,
since only one entry in the distance matrix exists for each group of
nodes; information is nonetheless given regarding which nodes belong
to each group.

The measures shown below the tree correspond to the partitions
resulting at each clustering step, from left to right in the printed
tree. In the example above, for instance, -1.000 is the measure
associated with the partition {2,5,6}, {1}, {3,4}; 0.556 with {2,5,6}, {1,3,4};
and 0.133 with {1,2,3,4,5,6}.

At this point, the following observation must be made regarding
the format of the printed tree. Asterisks ("*") are printed to indicate
where a cluster ends, so as to facilitate the reading of the partition
associated with a given level in the clustering trace, as follows:

Given a level in the trace (say level 10 in the tree shown above),
the partition associated with this level has as many components
(subgraphs) as there are horizontal lines crossed by a vertical line
joining the level numbers at the top and bottom of the diagram (two
in the case above); to read off the nodes belonging to each subgraph,
one backtracks from the point at which each horizontal line is crossed
(from right to left) and looks for the bottom-most asterisks in any
previous cluster. This strategy would result in the partition
{2,5,6}, {1,3,4} for the example above.
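The same reading can be done programmatically: the partition at a given level results from applying every merge at or below that level and then collecting the groups that are connected. The sketch below uses a small union-find structure. The merge heights are in the units of the clustering criterion (HCM1 distances computed from the matrix restored above), not in the 25 printed level numbers, and the names are hypothetical.

```python
from typing import Dict, List, Tuple


def partition_at_level(n_objects: int, merges: List[Tuple[float, int, int]],
                       level: float) -> List[List[int]]:
    """Apply every merge (height, i, j) with height <= level, where i and j
    are any members (0-based) of the two clusters joined; return the groups."""
    parent = list(range(n_objects))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path halving
            x = parent[x]
        return x

    for height, i, j in merges:
        if height <= level:
            parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for x in range(n_objects):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


# Objects 0..3 stand for {1}, {2}, {3,4} and {5,6}; 2 joins {5,6} first.
MERGES = [(0.25, 1, 3), (0.25, 0, 2), (0.8325, 0, 1)]
print(partition_at_level(4, MERGES, 0.5))   # [[0, 2], [1, 3]]
```

Mapped back to nodes, [[0, 2], [1, 3]] is the partition {1,3,4}, {2,5,6} read off the printed tree.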

Another observation regarding the printed tree is that only
25 clustering levels are printed in any case, due to space constraints.
Level values determined by the clustering method employed are rounded
off and assigned to the closest of the 25 printed levels. This can
result in clusterings involving more than two subclusters for output
purposes, which would seem to contradict the basic assumption of any
hierarchical clustering method; it should be understood that such a
contradiction is only apparent: the clustering process always proceeds
one step at a time.

"PRLK"  allows  to  print  the  links  exisLing  among  subgroups  in 
the  current  partition: 
REQ:

PRCL

CLUSTER (NO) OBJECTS
 1 (3) 1 3 4
 2 (3) 2 5 6

REQ:

PRLK

LINKS BETWEEN CLUSTERS:
1 - 2


(See Figure 1 and check that the printed link, from node 1 to node
2, is the only one between the subgraphs {1,3,4} and {2,5,6}.)

To illustrate the use of "SIDI", we recompute the similarity
matrix associated with the current distance matrix (recall that the
similarity matrix is never saved):
REQ:

SIMA

SIMILARITY MATRIX COMPUTED.

REQ:

SIDI

DISTANCE MATRIX COMPUTED FROM SIMILARITY MATRIX.
(NO CLUSTERING PERFORMED).

REQ:

PRDI

CURRENT DISTANCE MATRIX:
      1     2     3     4
 1: 0.00  0.71  0.33  0.81
 2: 0.71  0.00  0.81  0.33
 3: 0.33  0.81  0.00  0.90
 4: 0.81  0.33  0.90  0.00

The iterative procedure proposed in [Andreu 77] can be carried out
by means of "ITER"; the resulting clustering is displayed as the
iterations proceed:
REQ:

ITER

ENTER NUMBER OF ITERATIONS:

1

EPSILON:

0.001

CLUSTERING AT ITERATION 1:
CLUSTER (NO) OBJECTS
 1 (3) 1 3 4
 2 (3) 2 5 6

CONTINUE?

NO

REQ:

PRDI

CURRENT DISTANCE MATRIX:
      1     2
 1: 0.00  0.43
 2: 0.43  0.00


We now save the current status in file "SIXNOD". Since this
file already exists (see above), we first delete it through "DEFI"
(alternatively, we could specify that we wish to destroy its current
version after issuing a "SAVE" command, as shown above):
REQ:

DEFI

ENTER FILE NAME:

SIXNOD

FILE SIXNOD DELETED.

REQ:

SAVE

ENTER FILE NAME:

SIXNOD

STATUS SAVED IN FILE SIXNOD
REQ:
ADLK

/2;3/$

GRAPH UPDATED.
REQ:

NOLK

RECORDED LINKS.

FROM NODE  TO NODE(S):
 1 (3)  2, 3, 4
 2 (4)  1, 3, 5, 6
 3 (3)  1, 2, 4
 4 (2)  1, 3
 5 (2)  2, 6
 6 (2)  2, 5

(AVERAGE NO. OF LINKS PER NODE: 2.667).


Next, we investigate how the resulting distance matrix differs
from that obtained above through "ITER". Note that since we updated
the current graph, the initial distance matrix must be recomputed (a
direct request for "ITER" is refused); we do it through "DIMN" below
and see that a different preclustering from the one obtained before
results:
REQ:
ITER

UNABLE TO DO IT.

REQ:
DIMN

(PRECLUSTERING COMPLETE)

PRECLUSTERING PERFORMED AND DISTANCE MATRIX COMPUTED WITH PRECLUSTERS
NOT ALL BEING SINGLE NODES.

REQ:
PRCL

CLUSTER (NO) OBJECTS
   1    (2)   1 3
   2    (1)   2
   3    (1)   4
   4    (2)   5 6

REQ:
PRDI

CURRENT DISTANCE MATRIX:

        1     2     3     4
  1:  0.00  0.50  0.25  0.83
  2:  0.50  0.00  0.67  0.40
  3:  0.25  0.67  0.00  1.00
  4:  0.83  0.40  1.00  0.00

REQ:


Now we can use "ITER". Note that the intermediate clusterings differ
from those obtained previously, but that the final partition is the same,
and that the final distance matrix reflects the fact that the "coupling"
between the two subgraphs of the final partition is now higher (their
distance drops from 0.43 to 0.13):


REQ:
ITER

ENTER NUMBER OF ITERATIONS:
100
EPSILON:
0.001

CLUSTERING AT ITERATION 15:
CLUSTER (NO) OBJECTS
   1    (3)   1 3 4
   2    (1)   2
   3    (2)   5 6

CONTINUE?
YES

CLUSTERING AT ITERATION 16:
CLUSTER (NO) OBJECTS
   1    (3)   1 3 4
   2    (3)   2 5 6

CONTINUE?
NO

REQ:
PRDI

CURRENT DISTANCE MATRIX:

        1     2
  1:  0.00  0.13
  2:  0.13  0.00
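This section does not define the distance measure the package uses. As a crude and purely hypothetical proxy for "coupling", the sketch below counts the links whose two end nodes fall in different subgraphs of the final partition {1, 3, 4}, {2, 5, 6}. The "before" link set is our reconstruction of the graph prior to adding link 2-3. The proxy rises from one crossing link to two, in the same direction as the drop in distance from 0.43 to 0.13.

```python
from typing import Iterable, Sequence, Tuple


def inter_cluster_links(edges: Iterable[Tuple[int, int]],
                        partition: Sequence[Sequence[int]]) -> int:
    """Count links whose two end nodes lie in different clusters."""
    cluster_of = {}
    for k, members in enumerate(partition):
        for node in members:
            if node in cluster_of:
                raise ValueError(f"node {node} appears in two clusters")
            cluster_of[node] = k
    return sum(1 for a, b in edges if cluster_of[a] != cluster_of[b])


if __name__ == "__main__":
    partition = [[1, 3, 4], [2, 5, 6]]
    before = [(1, 2), (1, 3), (1, 4), (2, 5), (2, 6), (3, 4), (5, 6)]
    after = before + [(2, 3)]  # the link added with "ADLK"
    print(inter_cluster_links(before, partition),
          inter_cluster_links(after, partition))  # 1 2
```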


Next,  we  illustrate  the  use  of  "DELK"  and  "DENO"  to  transform 
the  current  graph  (Figure  2)  into  that  shown  in  Figure  3. 




Figure  3 Figure  4 

First,  we  delete  the  unneeded  links  and  add  the  new  ones: 

REQ:
DELK

GRAPH UPDATED.

REQ:
ADLK

/3:5/4:6/$

GRAPH UPDATED.

To obtain the graph in Figure 3, we must delete nodes 1 and 2.
We first check that they are currently isolated nodes ("ISOL"), and
then delete them ("DENO"):
REQ:
ISOL

ISOLATED:
1 2

REQ:
DENO

NODES:
/1,2/

THE FOLLOWING NODES HAVE BEEN REMOVED:
1 2

NODES HAVE BEEN RENAMED AS FOLLOWS:
OLD NO.   NEW NO.
   3         1
   4         2
   5         3
   6         4
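The "ISOL" check and the renumbering performed by "DENO" can be sketched as follows. The link set is our reconstruction of the graph after "DELK" and "ADLK". We also assume, in line with the check made above, that deleting a node that still has links is refused. The function names are hypothetical. Survivors are renumbered consecutively from 1 in their original order, which reproduces the OLD NO./NEW NO. table printed above.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def isolated_nodes(n_nodes: int, edges: List[Edge]) -> List[int]:
    """ISOL-like query: nodes 1..n_nodes that appear in no link."""
    linked: Set[int] = {v for edge in edges for v in edge}
    return [v for v in range(1, n_nodes + 1) if v not in linked]


def delete_nodes(n_nodes: int, edges: List[Edge],
                 doomed: Set[int]) -> Tuple[int, List[Edge], Dict[int, int]]:
    """DENO-like step: drop nodes, then renumber the survivors from 1 upwards."""
    still_linked = doomed - set(isolated_nodes(n_nodes, edges))
    if still_linked:
        # ASSUMPTION: refuse to delete nodes that still have links.
        raise ValueError(f"nodes still linked: {sorted(still_linked)}")
    survivors = [v for v in range(1, n_nodes + 1) if v not in doomed]
    renumber = {old: new for new, old in enumerate(survivors, start=1)}
    new_edges = [(renumber[a], renumber[b]) for a, b in edges]
    return len(survivors), new_edges, renumber


if __name__ == "__main__":
    edges = [(3, 4), (5, 6), (3, 5), (4, 6)]
    print(isolated_nodes(6, edges))       # [1, 2]
    n, new_edges, renumber = delete_nodes(6, edges, {1, 2})
    print(renumber)                       # {3: 1, 4: 2, 5: 3, 6: 4}
    print(new_edges)                      # [(1, 2), (3, 4), (1, 3), (2, 4)]
```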

Note that the remaining nodes have been renumbered, since the system
always numbers nodes consecutively starting at 1. Our current graph is
therefore the one shown in Figure 4, which is structurally equivalent to
that in Figure 3 except for node labelling. The structure can be checked
visually through "PRAD" and "NOLK":
REQ:
PRAD

CURRENT ADJACENCY MATRIX:

      1  2  3  4
  1:  1  1  1  0
  2:  1  1  0  1
  3:  1  0  1  1
  4:  0  1  1  1


REQ:
NOLK

RECORDED LINKS.

FROM NODE TO NODE(S):
   1  (2)  2, 3
   2  (2)  1, 4
   3  (2)  1, 4
   4  (2)  2, 3

(AVERAGE NO. OF LINKS PER NODE: 2.000)
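The "PRAD" printout is a symmetric 0/1 matrix with 1s on the diagonal, and the "NOLK" listing is the same information row by row. The hypothetical sketch below builds such a matrix from the four links of Figure 4, following the printout's convention of setting the diagonal to 1.

```python
from typing import Iterable, List, Tuple


def adjacency_matrix(n_nodes: int, edges: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """PRAD-like matrix; as in the printout above, the diagonal is set to 1."""
    m = [[1 if i == j else 0 for j in range(n_nodes)] for i in range(n_nodes)]
    for a, b in edges:
        m[a - 1][b - 1] = m[b - 1][a - 1] = 1  # undirected: mark both entries
    return m


if __name__ == "__main__":
    rows = adjacency_matrix(4, [(1, 2), (1, 3), (2, 4), (3, 4)])
    for i, row in enumerate(rows, start=1):
        print(f"{i}: " + " ".join(map(str, row)))
```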


Any  available  algorithm  can  now  be  applied  to  this  graph.  For 
example,  we  can  compute  a distance  matrix: 


REQ:
DIMN

(PRECLUSTERING COMPLETE)

NO PRECLUSTERING PERFORMED; DISTANCE MATRIX COMPUTED WITH P

REQ:
PRDI

CURRENT DISTANCE MATRIX:

        1     2     3     4
  1:  0.00  0.50  0.50  0.50
  2:  0.50  0.00  0.50  0.50
  3:  0.50  0.50  0.00  0.50
  4:  0.50  0.50  0.50  0.00
This sample session has illustrated the use of all the available
commands. We therefore quit ("QUIT"):

REQ:
QUIT

OK,
With the aid of the package introduced above, we computed a
distance matrix as shown below:


t 


peo: 

E'NGR 

♦ 

*22/ 3 .2.3, 5 , 6 , 21 /2 i 3 , 5/3  5 5/4 i 1 7 , 1 8 , 21 , 22/5 >7, 15/659, 2 1 / 

♦ 

7?  13,  1 4.15/8:1 :!. . 12/9  5 1 1 . 12,  21/10?  11 . 12. 19.20/11  ? 1 9.20.22/ 

• 

1 3 • 1 4 / 14  5 1 5 / 1 6 5 1 7 .18. 2 2 / I 7 5 1 8 / 1 8 ? 2 2 / 2 1 5 2 2 / $ 

NEW  GRAPH  ENTERED. 

PEG : 

NOLK 

RE  < or '.Ilf  n L JNKS  . 

FROM  NODE  TO  NODE  (5): 


1 

( 

5 ) 

2. 

3 . 

cr 
sJ  * 

6 , 

21 , 

n 

( 

3) 

i . 

hi 

5 f 

3 

( 

3) 

i . 

2 . 

5 , 

4 

( 

4 ) 

17. 

1 8 • 

21 , 

22. 

5 

( 

5) 

1 . 

2. 

3 , 
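The link-list input accepted by "ENGR" (and, in shorter form, by "ADLK") appears to name each node followed by its links to higher-numbered nodes. Entries are separated by "/", and "$" ends the input. The parser below is a sketch under stated assumptions: the leading field of "ENGR" input is the number of nodes, the separator between a node and its neighbours is ":" (our reading of the scanned session), and line breaks are ignored. `parse_link_list` is a hypothetical name.

```python
from typing import List, Tuple


def parse_link_list(text: str, with_node_count: bool = True) -> Tuple[int, List[Tuple[int, int]]]:
    """Parse input such as '4/1:2,3/2:4/3:4/$' into (number of nodes, links)."""
    body = "".join(text.split())  # long input is split over several lines
    if not body.endswith("$"):
        raise ValueError("input must end with '$'")
    fields = [f for f in body[:-1].split("/") if f]
    n_nodes = int(fields.pop(0)) if with_node_count else 0
    edges: List[Tuple[int, int]] = []
    for field in fields:
        node, _, targets = field.partition(":")  # ASSUMED separator ':'
        for t in targets.split(","):
            edges.append((int(node), int(t)))
    if with_node_count and any(not (1 <= v <= n_nodes) for e in edges for v in e):
        raise ValueError("link refers to an unknown node")
    return n_nodes, edges


if __name__ == "__main__":
    from collections import Counter

    engr = """22/1:2,3,5,6,21/2:3,5/3:5/4:17,18,21,22/5:7,15/6:9,21/
    7:13,14,15/8:11,12/9:11,12,21/10:11,12,19,20/11:19,20,22/
    13:14/14:15/16:17,18,22/17:18/18:22/21:22/$"""
    n, links = parse_link_list(engr)
    degree = Counter(v for link in links for v in link)
    print(n, len(links), [degree[v] for v in range(1, 6)])  # 22 39 [5, 3, 3, 4, 5]
    print(parse_link_list("/2:3/$", with_node_count=False))  # (0, [(2, 3)])
```

The degrees computed for nodes 1 to 5 match the counts in parentheses in the "NOLK" listing, which is a useful sanity check on the reading of the input.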
The rows and columns of the distance matrix on page 38 are
numbered according to the assigned cluster nodes shown on page 37.

We use two main approaches for partitioning the graph, as
follows:

1)  The  three  hierarchical  cluster  analysis  methods  (available 
in  the  package  through  "HCM1",  "HCM2"  and  "HCM3")  are 
used,  and 

2)  We  employ  the  iterative  procedure  available  through  "ITER" 
to  successively  partition  the  graph. 

The  results  are  presented  in  the  next  two  sections. 

4.1.  Cluster  analysis  approach. 

The  results  obtained  with  "HCM1",  "HCM2"  and  "HCM3"  are  shown 
in  the  next  three  pages,  where  the  hierarchical  clustering  trees  (traces) 
are  depicted. 

As can be seen, all three methods produced the same graph
partition (Figure 5), which coincides with that presented in [Andreu
& Madnick 77].
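This section does not state which linkage rules "HCM1", "HCM2" and "HCM3" implement. As a stand-in only, the sketch below implements single-linkage agglomerative clustering over a distance matrix and records the merge trace, which is the kind of information a hierarchical clustering tree displays. `single_linkage_trace` is a hypothetical name, and ties are broken by taking the first pair found. The usage example applies it to the 4 x 4 distance matrix printed by "PRDI" at the start of this section.

```python
from typing import FrozenSet, List, Sequence, Tuple

Merge = Tuple[List[int], List[int], float]


def single_linkage_trace(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> List[Merge]:
    """Merge clusters until one remains; return (cluster A, cluster B, distance) per merge."""
    pos = {label: i for i, label in enumerate(labels)}
    clusters: List[FrozenSet[int]] = [frozenset([label]) for label in labels]

    def linkage(a: FrozenSet[int], b: FrozenSet[int]) -> float:
        # Single linkage: distance between the closest pair of members.
        return min(dist[pos[x]][pos[y]] for x in a for y in b)

    trace: List[Merge] = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        trace.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return trace


if __name__ == "__main__":
    D = [[0.00, 0.71, 0.33, 0.81],
         [0.71, 0.00, 0.81, 0.33],
         [0.33, 0.81, 0.00, 0.90],
         [0.81, 0.33, 0.90, 0.00]]
    for a, b, d in single_linkage_trace(D, [1, 2, 3, 4]):
        print(a, "+", b, "at", d)
```

On that matrix the trace merges {1} with {3} and {2} with {4} at 0.33, then joins the two pairs at 0.71. Complete or average linkage would differ only in the `linkage` function.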
4.2. Iterative approach.

Starting  with  the  distance  matrix  shown  on  page  38,  we  used 
"ITER"  and  "DENO"  to  successively  partition  the  graph.  The  first 
steps  of  this  procedure  are  illustrated  below: 